21 research outputs found

    A Simple Algorithm for Estimating Distribution Parameters from n-Dimensional Randomized Binary Responses

    Full text link
    Randomized response is attractive for privacy-preserving data collection because the privacy provided can be quantified by means such as differential privacy. However, recovering and analyzing statistics involving multiple dependent randomized binary attributes can be difficult, posing a significant barrier to use. In this work, we address this problem by identifying and analyzing a family of response randomizers that change each binary attribute independently with the same probability. Modes of Google's Rappor randomizer as well as applications of two well-known classical randomized response methods, Warner's original method and Simmons' unrelated question method, belong to this family. We show that randomizers in this family transform multinomial distribution parameters by an iterated Kronecker product of an invertible and bisymmetric 2 × 2 matrix. This allows us to present a simple and efficient algorithm for obtaining unbiased maximum likelihood parameter estimates for k-way marginals from randomized responses, and to provide theoretical bounds on the statistical efficiency achieved. We also describe the tradeoff between efficiency and differential privacy. Importantly, both the randomization of responses and the estimation algorithm are simple to implement, an aspect critical to technologies for privacy protection and security. Comment: Accepted at Information Security - 21st International Conference, ISC 2018. Adapted to meet article length requirements. Fixed typo. Results unchanged.
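    The estimation step described above can be illustrated with a short sketch (not the authors' code; the function names, the flip-retention probability p, and the attribute count k are illustrative): each bit is kept with probability p and flipped otherwise, so the observed response distribution is the true joint distribution multiplied by the k-fold Kronecker power of the bisymmetric matrix M = [[p, 1-p], [1-p, p]], and applying the Kronecker power of M's inverse to the empirical frequencies recovers an unbiased estimate.

```python
import numpy as np
from functools import reduce

def randomize(bits, p, rng):
    """Flip each binary attribute independently: keep with prob p, flip with 1-p."""
    flips = rng.random(bits.shape) >= p
    return bits ^ flips

def estimate(responses, p, k):
    """Unbiased estimate of the joint distribution over k binary attributes."""
    M = np.array([[p, 1 - p], [1 - p, p]])   # bisymmetric; invertible for p != 0.5
    Minv = np.linalg.inv(M)
    T = reduce(np.kron, [Minv] * k)          # iterated Kronecker product of M^{-1}
    # empirical frequencies of the 2^k response patterns (first attribute = MSB)
    idx = responses @ (1 << np.arange(k)[::-1])
    freq = np.bincount(idx, minlength=2 ** k) / len(responses)
    return T @ freq
```

    Because M is doubly stochastic, the columns of its inverse sum to one, so the estimate always sums to one even when individual entries fall slightly outside [0, 1] due to sampling noise.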

    Approximation properties of haplotype tagging

    Get PDF
    BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. RESULTS: It is shown that the tagging problem is NP-hard but approximable within 1 + ln((n^2 - n)/2) for n haplotypes, yet not approximable within (1 - ε) ln(n/2) for any ε > 0 unless NP ⊆ DTIME(n^(log log n)). A simple, easily implementable algorithm that achieves the above upper bound on solution quality is presented. This algorithm has running time O([Image: see text] (2m - p + 1)) ≤ O(m(n^2 - n)/2), where p ≤ min(n, m), for n haplotypes of size m. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. CONCLUSION: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single-processor machine. Hence, significant improvement in the computational effort expended can only be expected if the computation is distributed and done in parallel.
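    The kind of greedy strategy behind a 1 + ln((n^2 - n)/2) guarantee can be sketched as follows (a hypothetical illustration, not the paper's implementation): treat each SNP column as "covering" the haplotype pairs it distinguishes, and repeatedly pick the column that splits the most still-indistinguishable pairs, in the style of the classical greedy set-cover argument.

```python
from itertools import combinations

def greedy_tag(haplotypes):
    """Greedy SNP selection: repeatedly pick the column (SNP) that
    distinguishes the most not-yet-distinguished haplotype pairs."""
    n, m = len(haplotypes), len(haplotypes[0])
    # only pairs of distinct haplotypes can (and need to) be separated
    uncovered = {(i, j) for i, j in combinations(range(n), 2)
                 if haplotypes[i] != haplotypes[j]}
    tags = []
    while uncovered:
        best = max(range(m), key=lambda s: sum(
            1 for (i, j) in uncovered
            if haplotypes[i][s] != haplotypes[j][s]))
        newly = {(i, j) for (i, j) in uncovered
                 if haplotypes[i][best] != haplotypes[j][best]}
        if not newly:
            break  # safety: no column separates any remaining pair
        tags.append(best)
        uncovered -= newly
    return tags
```

    For example, on the haplotypes "0011", "0101", "0110" the sketch returns two columns whose values already identify each haplotype uniquely, rather than using all four.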

    Perceptions of molecular epidemiology studies of HIV among stakeholders

    Get PDF
    Background: Advances in viral sequence analysis make it possible to track the spread of infectious pathogens, such as HIV, within a population. When used to study HIV, these analyses (i.e., molecular epidemiology) potentially allow inference of the identity of individual research subjects. Current privacy standards are likely insufficient for this type of public health research. To address this challenge, it will be important to understand how stakeholders feel about the benefits and risks of such research. Design and Methods: To better understand perceived benefits and risks of these research methods, in-depth qualitative interviews were conducted with HIV-infected individuals, individuals at high risk for contracting HIV, and professionals in HIV care and prevention. To gather additional perspectives, attendees at a public lecture on molecular epidemiology were asked to complete an informal questionnaire. Results: Among those interviewed and polled, there was near unanimous support for using molecular epidemiology to study HIV. Questionnaires showed strong agreement about benefits of molecular epidemiology, but diverse attitudes regarding risks. Interviewees acknowledged several risks, including privacy breaches and provocation of anti-gay sentiment. The interviews also demonstrated a possibility that misunderstandings about molecular epidemiology may affect how risks and benefits are evaluated. Conclusions: While nearly all study participants agree that the benefits of HIV molecular epidemiology outweigh the risks, concerns about privacy must be addressed to ensure continued trust in research institutions and willingness to participate in research.

    Differential privacy for symmetric log-concave mechanisms

    Full text link
    Adding random noise to database query results is an important tool for achieving privacy. A challenge is to minimize this noise while still meeting privacy requirements. Recently, a sufficient and necessary condition for (ε, δ)-differential privacy for Gaussian noise was published. This condition allows the computation of the minimum privacy-preserving scale for this distribution. We extend this work and provide a sufficient and necessary condition for (ε, δ)-differential privacy for all symmetric and log-concave noise densities. Our results allow fine-grained tailoring of the noise distribution to the dimensionality of the query result. We demonstrate that this can yield significantly lower mean squared errors than those incurred by the currently used Laplace and Gaussian mechanisms for the same ε and δ. Comment: AISTATS 2022, v2 corrects typo.
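    For the Gaussian case alluded to above, the minimum scale can be found numerically. The sketch below is an assumption-laden illustration, not this paper's generalized condition: it assumes the tight Gaussian-mechanism expression δ(σ) = Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) for sensitivity Δ, and bisects on σ, since δ(σ) decreases as σ grows.

```python
import math

def gauss_dp_delta(sigma, eps, sens=1.0):
    """Delta achieved by N(0, sigma^2) noise at a given epsilon,
    under the tight condition for the Gaussian mechanism."""
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))  # standard normal CDF
    a = sens / (2 * sigma)
    b = eps * sigma / sens
    return phi(a - b) - math.exp(eps) * phi(-a - b)

def min_sigma(eps, delta, sens=1.0, tol=1e-9):
    """Smallest sigma with gauss_dp_delta(sigma) <= delta, by bisection."""
    lo, hi = 1e-6, 1.0
    while gauss_dp_delta(hi, eps, sens) > delta:
        hi *= 2.0                      # grow until delta is satisfied
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gauss_dp_delta(mid, eps, sens) <= delta:
            hi = mid
        else:
            lo = mid
    return hi
```

    At ε = 1 and δ = 1e-5, for instance, the scale returned this way is noticeably smaller than the classical sqrt(2 ln(1.25/δ))/ε prescription, which is exactly the kind of noise saving the abstract describes.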

    A Note on the Hardness of the k-Ambiguity Problem

    No full text
    We address the problem of minimal information loss in k-ambiguating data, a problem related to disclosure control in disseminated data. We show that this problem is NP-hard by considering cell suppression as the ambiguation mechanism. Along the way, we prove that the minimum k-union problem (a.k.a. minimum k-coverage, a.k.a. maximum k-intersection), the problem of selecting k sets from a collection of n sets such that the cardinality of their union is minimized, is NP-hard. We also show that if the cardinality of the sets in the collection is bounded by a constant, this restricted problem is in APX.
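    To make the minimum k-union problem concrete, here is a hypothetical brute-force sketch (not from the note itself); its running time is exponential in n, consistent with the NP-hardness result, so it is only usable for small collections.

```python
from itertools import combinations

def min_k_union(sets, k):
    """Exhaustive minimum k-union: pick k of the n sets whose union
    has the smallest cardinality, by trying all C(n, k) choices."""
    best = None
    for combo in combinations(sets, k):
        u = set().union(*combo)
        if best is None or len(u) < len(best):
            best = u
    return best
```

    For example, with the collection {1, 2}, {2, 3}, {2}, {7, 8, 9} and k = 2, the smallest achievable union has two elements, obtained by pairing {2} with either of the sets containing it.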

    Spectral Anonymization of Data

    No full text